An Extensive Evaluation of Filtering Misclassified Instances in Supervised Classification Tasks

Authors

  • Michael R. Smith
  • Tony R. Martinez
Abstract

Removing or filtering outliers and mislabeled instances prior to training a learning algorithm has been shown to increase classification accuracy. A popular approach for handling outliers and mislabeled instances is to remove any instance that is misclassified by a learning algorithm. However, an examination of which learning algorithms to use for filtering, as well as their effects on multiple learning algorithms over a large set of data sets, has not been done. Previous work has generally been limited by the large computational requirements of such an experiment; thus, the examination has generally been restricted to computationally inexpensive learning algorithms and a small number of data sets. In this paper, we examine 9 learning algorithms as filtering algorithms and examine the effects of filtering on the 9 chosen learning algorithms over a set of 54 data sets. In addition to using each learning algorithm individually as a filter, we also use the set of learning algorithms as an ensemble filter and use an adaptive algorithm that selects a subset of the learning algorithms for filtering for a specific task and learning algorithm. We find that in most cases, using an ensemble of learning algorithms for filtering produces the greatest increase in classification accuracy. We also compare filtering with a majority voting ensemble. The voting ensemble significantly outperforms filtering unless there are high amounts of noise present in the data set. Additionally, we find that a majority voting ensemble is robust to noise, as filtering with a voting ensemble does not increase the classification accuracy of the voting ...
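The filtering idea described in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the three toy learners (`knn`, `nearest_centroid`) stand in for the paper's 9 learning algorithms, which the abstract does not name, and the fold count, the "more than half of the learners" removal rule, and the toy data set are hypothetical choices, not the authors' experimental setup.

```python
from collections import Counter

def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))

def knn(k):
    """Build a k-nearest-neighbor learner (illustrative stand-in learner)."""
    def learner(train_X, train_y):
        def predict(x):
            order = sorted(range(len(train_X)), key=lambda i: sq_dist(x, train_X[i]))
            labels = [train_y[i] for i in order[:k]]
            return Counter(labels).most_common(1)[0][0]
        return predict
    return learner

def nearest_centroid(train_X, train_y):
    """Learner that predicts the class whose mean vector is closest."""
    groups = {}
    for x, y in zip(train_X, train_y):
        groups.setdefault(y, []).append(x)
    centroids = {y: [sum(col) / len(rows) for col in zip(*rows)]
                 for y, rows in groups.items()}
    return lambda x: min(centroids, key=lambda y: sq_dist(x, centroids[y]))

def misclassified_by(learner, X, y, n_folds=5):
    """Indices the learner gets wrong when each fold is held out in turn."""
    wrong = set()
    for fold in range(n_folds):
        train = [i for i in range(len(X)) if i % n_folds != fold]
        test = [i for i in range(len(X)) if i % n_folds == fold]
        predict = learner([X[i] for i in train], [y[i] for i in train])
        wrong.update(i for i in test if predict(X[i]) != y[i])
    return wrong

def ensemble_filter(learners, X, y, n_folds=5):
    """Return indices to keep: drop an instance only if more than half
    of the learners misclassify it (assumed majority rule)."""
    votes = Counter()
    for learner in learners:
        votes.update(misclassified_by(learner, X, y, n_folds))
    return [i for i in range(len(X)) if 2 * votes[i] <= len(learners)]

# Toy data: two well-separated clusters plus one deliberately mislabeled point.
X = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [0.1, 0.1], [0.3, 0.2],
     [5.0, 5.0], [5.1, 5.2], [5.2, 5.1], [5.1, 5.1], [5.3, 5.2],
     [0.15, 0.15]]
y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1]  # index 10 sits in cluster 0 but is labeled 1

learners = [knn(1), knn(3), nearest_centroid]
kept = ensemble_filter(learners, X, y)
single = misclassified_by(knn(1), X, y)  # what a single 1-NN filter would remove
print("kept by ensemble filter:", kept)
print("flagged by 1-NN alone:", sorted(single))
```

On this toy set the single 1-NN filter also flags several correctly labeled neighbors of the noisy point, while the majority rule removes only the mislabeled instance. This is a small-scale picture of why an ensemble filter can be more conservative than a single-learner filter; it says nothing about the accuracy gains reported in the paper.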

Similar resources

A Comparative Evaluation of Curriculum Learning with Filtering and Boosting in Supervised Classification Problems

Not all instances in a data set are equally beneficial for inferring a model of the data. Some instances (such as outliers) are detrimental to inferring a model of the data. Several machine learning techniques treat instances in a data set differently during training such as curriculum learning, filtering, and boosting. However, an automated method for determining how beneficial an instance is ...

Classification of encrypted traffic for applications based on statistical features

Traffic classification plays an important role in many aspects of network management such as identifying type of the transferred data, detection of malware applications, applying policies to restrict network accesses and so on. Basic methods in this field were using some obvious traffic features like port number and protocol type to classify the traffic type. However, recent changes in applicat...

Constructing ensembles of classifiers using supervised projection methods based on misclassified instances

In this paper we propose an approach for ensemble construction based on the use of supervised projections, both linear and non-linear, to achieve both accuracy and diversity of individual classifiers. The proposed approach uses the philosophy of boosting, putting more effort on difficult instances, but instead of learning the classifier on a biased distribution of the training set, it uses misc...

Polarimetric Synthetic Aperture Radar Image Classification using Bag of Visual Words Algorithm

Land cover is defined as the physical material of the surface of the earth, including different vegetation covers, bare soil, water surface, various urban areas, etc. Land cover and its changes are very important and influential on the Earth and life of living organisms, especially human beings. Land cover change monitoring is important for protecting the ecosystem, forests, farmland, open spac...

Online Passive Aggressive Active Learning and Its Applications

We investigate online active learning techniques for classification tasks in data stream mining applications. Unlike traditional learning approaches (either batch or online learning) that often require to request the class label of each incoming instance, online active learning queries only a subset of informative incoming instances to update the classification model, which aims to maximize cla...

Journal:
  • CoRR

Volume abs/1312.3970

Publication date: 2013